TANGO is an instruction-guided diffusion model for text-to-audio generation. Given a text prompt, it can produce realistic audio, including human voices, animal sounds, and natural or artificial sound effects.
Tags: Audio Generation, Transformers, English